125 research outputs found

ExplaiNE: An Approach for Explaining Network Embedding-based Link Predictions

Author: De Bie Tijl
Kang Bo
Lijffijt Jefrey
Publication date: 01/01/2019

Networks are powerful data structures, but are challenging to work with for conventional machine learning methods. Network Embedding (NE) methods attempt to resolve this by learning vector representations for the nodes, for subsequent use in downstream machine learning tasks. Link Prediction (LP) is one such downstream machine learning task that is an important use case and popular benchmark for NE methods. Unfortunately, while NE methods perform exceedingly well at this task, they lack transparency compared to simpler LP approaches. We introduce ExplaiNE, an approach to offer counterfactual explanations for NE-based LP methods, by identifying existing links in the network that explain the predicted links. ExplaiNE is applicable to a broad class of NE algorithms. An extensive empirical evaluation for the NE method 'Conditional Network Embedding' in particular demonstrates its accuracy and scalability.

arXiv.org e-Print Archive

Ghent University Academic Bibliography

Quantifying and minimizing risk of conflict in social networks

Author: Chen Xi
De Bie Tijl
Lijffijt Jefrey
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2018

Controversy, disagreement, conflict, polarization and opinion divergence in social networks have been the subject of much recent research. In particular, researchers have addressed the question of how such concepts can be quantified given people’s prior opinions, and how they can be optimized by influencing the opinion of a small number of people or by editing the network’s connectivity. Here, rather than optimizing such concepts given a specific set of prior opinions, we study whether they can be optimized in the average case and in the worst case over all sets of prior opinions. In particular, we derive the worst-case and average-case conflict risk of networks, and we propose algorithms for optimizing these. For some measures of conflict, these are non-convex optimization problems with many local minima. We provide a theoretical and empirical analysis of the nature of some of these local minima, and show how they are related to existing organizational structures. Empirical results show how a small number of edits quickly decreases a network's conflict risk, both average-case and worst-case. Furthermore, they show that minimizing average-case conflict risk often does not reduce worst-case conflict risk. Minimizing worst-case conflict risk, on the other hand, while computationally more challenging, is generally effective at minimizing both worst-case and average-case conflict risk.

Crossref

Ghent University Academic Bibliography

Benchmarking Network Embedding Models for Link Prediction: Are We Making Progress?

Author: De Bie Tijl
Lijffijt Jefrey
Mara Alexandru
Publication date: 01/01/2020

Network embedding methods map a network's nodes to vectors in an embedding space, in such a way that these representations are useful for estimating some notion of similarity or proximity between pairs of nodes in the network. The quality of these node representations is then showcased through results of downstream prediction tasks. Commonly used benchmark tasks such as link prediction, however, present complex evaluation pipelines and an abundance of design choices. This, together with a lack of standardized evaluation setups, can obscure the real progress in the field. In this paper, we aim to shed light on the state-of-the-art of network embedding methods for link prediction and show, using a consistent evaluation pipeline, that only limited progress has been made in recent years. The newly conducted benchmark that we present here, including 17 embedding methods, also shows that many approaches are outperformed even by simple heuristics. Finally, we argue that standardized evaluation tools can repair this situation and boost future progress in this field.
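To make the notion of a "simple heuristic" for link prediction concrete, the following is a minimal sketch of common-neighbour scoring, a classic baseline. The abstract does not say which heuristics the benchmark uses, so this choice is an assumption; the function name and the toy edge list are hypothetical.

```python
from itertools import combinations


def common_neighbour_scores(edges):
    """Score every unlinked node pair by its number of shared neighbours.

    Illustrative baseline only; not taken from the benchmark itself.
    """
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)

    scores = {}
    for u, v in combinations(sorted(neighbours), 2):
        if v not in neighbours[u]:  # only score pairs without an observed link
            scores[(u, v)] = len(neighbours[u] & neighbours[v])
    return scores


# Hypothetical toy network.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
scores = common_neighbour_scores(edges)

# Candidate links, most likely first.
ranked = sorted(scores.items(), key=lambda item: -item[1])
```

Pairs at the top of `ranked` share the most neighbours and are the heuristic's strongest link candidates.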

arXiv.org e-Print Archive

Crossref

Ghent University Academic Bibliography

Conditional network embeddings

Author: De Bie Tijl
Kang Bo
Lijffijt Jefrey
Publication date: 01/01/2019

Network Embeddings (NEs) map the nodes of a given network into $d$-dimensional Euclidean space $\mathbb{R}^d$. Ideally, this mapping is such that 'similar' nodes are mapped onto nearby points, such that the NE can be used for purposes such as link prediction (if 'similar' means being 'more likely to be connected') or classification (if 'similar' means 'being more likely to have the same label'). In recent years various methods for NE have been introduced, all following a similar strategy: defining a notion of similarity between nodes (typically some distance measure within the network), a distance measure in the embedding space, and a loss function that penalizes large distances for similar nodes and small distances for dissimilar nodes. A difficulty faced by existing methods is that certain networks are fundamentally hard to embed due to their structural properties: (approximate) multipartiteness, certain degree distributions, assortativity, etc. To overcome this, we introduce a conceptual innovation to the NE literature and propose to create Conditional Network Embeddings (CNEs); embeddings that maximally add information with respect to given structural properties (e.g. node degrees, block densities, etc.). We use a simple Bayesian approach to achieve this, and propose a block stochastic gradient descent algorithm for fitting it efficiently. We demonstrate that CNEs are superior for link prediction and multi-label classification when compared to state-of-the-art methods, and this without adding significant mathematical or computational complexity. Finally, we illustrate the potential of CNE for network visualization.
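The generic strategy described above (a loss that penalizes large distances for similar nodes and small distances for dissimilar nodes) can be sketched as follows. This is a plain margin-based contrastive loss chosen for illustration, not the Bayesian CNE objective; the function name, the `margin` parameter, and the toy embeddings are all assumptions.

```python
import math


def embedding_loss(embedding, linked_pairs, unlinked_pairs, margin=1.0):
    """Generic distance-based embedding loss (illustrative, not CNE).

    Linked pairs pay their squared distance; unlinked pairs pay only
    when they sit closer together than `margin`.
    """
    loss = 0.0
    for u, v in linked_pairs:
        loss += math.dist(embedding[u], embedding[v]) ** 2
    for u, v in unlinked_pairs:
        gap = margin - math.dist(embedding[u], embedding[v])
        loss += max(0.0, gap) ** 2
    return loss


# Hypothetical 2-dimensional embeddings of three nodes.
linked = [("a", "b")]
unlinked = [("a", "c")]
good = {"a": (0.0, 0.0), "b": (0.0, 0.0), "c": (3.0, 4.0)}
bad = {"a": (0.0, 0.0), "b": (3.0, 4.0), "c": (0.0, 0.0)}

good_loss = embedding_loss(good, linked, unlinked)
bad_loss = embedding_loss(bad, linked, unlinked)
```

The `good` embedding places the linked pair together and the unlinked pair far apart, so it incurs a lower loss than `bad`, which does the opposite.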

Ghent University Academic Bibliography

Subjectively interesting connecting trees

Author: Adriaens Florian
De Bie Tijl
Lijffijt Jefrey
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017

Crossref

Ghent University Academic Bibliography

Computational methods for comparison and exploration of event sequences

Author: Lijffijt Jefrey
Publication venue: Aalto-yliopisto
Publication date: 01/01/2013

Many types of data, e.g., natural language texts, biological sequences, or time series of sensor data, contain sequential structure. Analysis of such sequential structure is interesting for various reasons, for example, to detect that data consists of several homogeneous parts, that data contains certain recurring patterns, or to find parts that are different or surprising compared to the rest of the data. The main question studied in this thesis is how to identify global and local patterns in event sequences. Within this broad topic, we study several subproblems. The first problem that we address is how to compare event frequencies across event sequences and databases of event sequences. Such comparisons are relevant, for example, to linguists who are interested in comparing word counts between two corpora to identify linguistic differences, e.g., between groups of speakers, or language change over time. The second problem that we address is how to find areas in an event sequence where an event has a surprisingly high or low frequency. More specifically, we study how to take into account the multiple testing problem when looking for local frequency deviations in event sequences. Many algorithms for finding local patterns in event sequences require that the person applying the algorithm chooses the level of granularity at which the algorithm operates, and it is often not clear how to choose that level. The third problem that we address is which granularities to use when looking for local patterns in an event sequence. The main contributions of this thesis are computational methods that can be used to compare and explore (databases of) event sequences with high computational efficiency, increased accuracy, and that offer new perspectives on the sequential structure of data. 
Furthermore, we illustrate how the proposed methods can be applied to solve practical data analysis tasks, and describe several experiments and case studies where the methods are applied to various types of data. The primary focus is on natural language texts, but we also study DNA sequences and sensor data. We find that the methods work well in practice and that they can efficiently uncover various types of interesting patterns in the data.

Ghent University Academic Bibliography

Aaltodoc Publication Archive

ALPINE: Active Link Prediction using Network Embedding

Author: Chen Xi
De Bie Tijl
Kang Bo
Lijffijt Jefrey
Publication date: 01/01/2020

Many real-world problems can be formalized as predicting links in a partially observed network. Examples include Facebook friendship suggestions, consumer-product recommendations, and the identification of hidden interactions between actors in a crime network. Several link prediction algorithms, notably those recently introduced using network embedding, are capable of doing this by just relying on the observed part of the network. Often, the link status of a node pair can be queried, which can be used as additional information by the link prediction algorithm. Unfortunately, such queries can be expensive or time-consuming, mandating the careful consideration of which node pairs to query. In this paper we estimate the improvement in link prediction accuracy after querying any particular node pair, to use in an active learning setup. Specifically, we propose ALPINE (Active Link Prediction usIng Network Embedding), the first method to achieve this for link prediction based on network embedding. To this end, we generalize the notion of V-optimality from experimental design to this setting, as well as more basic active learning heuristics originally developed in standard classification settings. Empirical results on real data show that ALPINE is scalable, and boosts link prediction accuracy with far fewer queries.

Ghent University Academic Bibliography

Explainable subgraphs with surprising densities: a subgroup discovery approach

Author: De Bie Tijl
Deng Junning
Kang Bo
Lijffijt Jefrey
Publication date: 01/01/2019

The connectivity structure of graphs is typically related to the attributes of the nodes. In social networks for example, the probability of a friendship between any pair of people depends on a range of attributes, such as their age, residence location, workplace, and hobbies. The high-level structure of a graph can thus possibly be described well by means of patterns of the form 'the subgroup of all individuals with certain properties X are often (or rarely) friends with individuals in another subgroup defined by properties Y', in comparison to what is expected. Such rules present potentially actionable and generalizable insight into the graph. We present a method that finds node subgroup pairs between which the edge density is interestingly high or low, using an information-theoretic definition of interestingness. Additionally, the interestingness is quantified subjectively, to contrast with prior information an analyst may have about the connectivity. This view immediately enables iterative mining of such patterns. This is the first method aimed at graph connectivity relations between different subgroups. Our method generalizes prior work on dense subgraphs induced by a subgroup description. Although this setting has been studied already, we demonstrate for this special case considerable practical advantages of our subjective interestingness measure with respect to a wide range of (objective) interestingness measures.

arXiv.org e-Print Archive

Crossref

Ghent University Academic Bibliography

Opinion dynamics with backfire effect and biased assimilation

Author: Chen Xi
De Bie Tijl
Lijffijt Jefrey
Tsaparas Panayiotis
Publication date: 01/01/2019

The democratization of AI tools for content generation, combined with unrestricted access to mass media for all (e.g. through microblogging and social media), makes it increasingly hard for people to distinguish fact from fiction. This raises the question of how individual opinions evolve in such a networked environment without grounding in a known reality. The dominant approach to studying this problem uses simple models from the social sciences on how individuals change their opinions when exposed to their social neighborhood, and applies them to large social networks. We propose a novel model that incorporates two known social phenomena: (i) Biased Assimilation: the tendency of individuals to adopt other opinions if they are similar to their own; (ii) Backfire Effect: the fact that an opposite opinion may further entrench someone in their stance, making their opinion more extreme instead of moderating it. To the best of our knowledge this is the first DeGroot-type opinion formation model that captures the Backfire Effect. A thorough theoretical and empirical analysis of the proposed model reveals intuitive conditions for polarization and consensus to exist, as well as the properties of the resulting opinions.
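For readers unfamiliar with DeGroot-type models, here is a minimal sketch of the classic DeGroot update, in which each individual repeatedly replaces their opinion with a weighted average of their neighbours' opinions. This is the standard baseline only: it does not include the Biased Assimilation or Backfire Effect mechanisms proposed in the paper, and the weight matrix and initial opinions are hypothetical.

```python
def degroot_step(weights, opinions):
    """One classic DeGroot update: x_i <- sum_j w_ij * x_j.

    `weights` is a row-stochastic matrix (each row sums to 1) given as
    a list of lists; row i holds the trust node i places in each node.
    """
    return [sum(w * x for w, x in zip(row, opinions)) for row in weights]


def degroot(weights, opinions, steps):
    """Apply the DeGroot update `steps` times and return the result."""
    for _ in range(steps):
        opinions = degroot_step(weights, opinions)
    return opinions


# Hypothetical three-person path network with self-trust.
weights = [
    [0.50, 0.50, 0.00],
    [0.25, 0.50, 0.25],
    [0.00, 0.50, 0.50],
]
initial = [1.0, 0.0, 0.0]

after_one = degroot_step(weights, initial)
final = degroot(weights, initial, 50)
```

On this connected toy network the opinions converge to a single consensus value, which is the behaviour the classic model predicts and which richer models such as the one proposed here can break.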

Ghent University Academic Bibliography

Direct mining of subjectively interesting relational patterns

Author: Aknin Achille
De Bie Tijl
Guns Tias
Lijffijt Jefrey
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2016

Data is typically complex and relational. Therefore, the development of relational data mining methods is an increasingly active topic of research. Recent work has resulted in new formalisations of patterns in relational data and in a way to quantify their interestingness in a subjective manner, taking into account the data analyst's prior beliefs about the data. Yet, a scalable algorithm to find the most interesting such patterns is lacking. We introduce a new algorithm based on two notions: (1) the use of Constraint Programming, which results in a notably shorter development time, faster runtimes, and more flexibility for extensions such as branch-and-bound search, and (2) the direct search for the most interesting patterns only, instead of exhaustive enumeration of patterns before ranking them. Through empirical evaluation, we find that our novel bounds yield speedups up to several orders of magnitude, especially on dense data with a simple schema. This makes it possible to mine the most subjectively-interesting relational patterns present in databases where this was previously impractical or impossible.

Crossref

Ghent University Academic Bibliography